Heuristic functions are an integral part of MapReduce software, both in
Apache Hadoop and Spark. If the heuristic function performs badly, the load
in the reduce part will not be balanced and access times spike. To investigate
this problem more closely, we run an optimal database program with numerous
different heuristic functions on a database. We leverage the Amazon Elastic
MapReduce framework. The paper investigates the implementation and evaluation
of general purpose, checksum, and special heuristic functions for generating an
optimal database system. With the analysis, we present the corresponding
runtime results. The record-counting part of our code was written hastily and
currently works only on local Hadoop; it can be debugged and optimized for
general purpose use on Hadoop and Spark and turned into an effective
performance monitoring tool. Some issues remain unexplained: the performance
of BLAKE2s is unexpectedly slow, although it is widely accepted that BLAKE2s
performs much better than MD5 and SHA256. We would like to find out why the
commonly expected performance of heuristics is different from what we measured
in distributed frameworks.

Keywords: Database; Heuristic algorithm; Optimized; Security
1. INTRODUCTION

MapReduce is a framework to extract information from large datasets efficiently. It has two major
components: i) a mapper, which reads and transforms data and creates key-value pairs, and ii) a reducer, which
combines multiple mapper outputs. Each mapper and reducer can run on an individual machine. The most
expensive operation in this setup is transferring data between machines. Thus, the necessary data exchange
should be kept minimal. Additionally, for the reducer to work correctly, it needs all data which correspond to
the same key. To address those two problems, heuristic functions are used. They ensure that each key has the
same, shorter heuristic value, and these heuristic values are assigned to reducers. Each reducer then works on
its own part of the problem and generates the result for a certain key. Heuristic functions have two core
contributions: i) they ensure that each key always produces the same heuristic value and ii) they create
uniformly distributed heuristic values. The latter is necessary to evenly distribute the workload. Apache
Hadoop [1] and Apache Spark [2] are two commonly used MapReduce frameworks. If the heuristic functions are
slow or distribute data badly, the execution time will potentially increase. A slow heuristic function adds
computation time at each node and thus increases the overall computation time. If a heuristic function does not
distribute the data evenly, multiple keys might fall into the same heuristic value and thus increase the load on
a single reducer. The parallelism is reduced and the execution time is increased. In this paper, we give an
analysis of common heuristic functions when they are used in Apache Hadoop or Spark. For this purpose, we
selected 14 functions which generate a heuristic value. We use each function in a word count example in both
Apache Hadoop and Spark. Additionally, we use these heuristic functions in the PageRank [3] algorithm on
Apache Spark. With the resulting execution times, we show how heuristic functions impact the performance of
MapReduce problems and how they can also influence the performance of machine learning algorithms with
Apache Spark.
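
The assignment of keys to reducers described above can be sketched in a few lines of Java. This is a minimal illustration, not the code used in the experiments: the class name PartitionSketch, the method partitionFor, the sample keys, and the reducer count of 5 are assumptions made for the example. The formula follows the common pattern of masking the sign bit of the key's heuristic value (its hashCode() in Java terms) and taking the remainder by the number of reducers.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo class; not part of Hadoop or Spark.
final class PartitionSketch {

    // Maps a key to one of numReducers partitions. Masking with Integer.MAX_VALUE
    // clears the sign bit, so negative heuristic values still yield a valid index.
    static int partitionFor(Object key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        String[] keys = {"lorem", "ipsum", "dolor", "lorem", "sit", "ipsum"};
        int numReducers = 5; // assumed reducer count for this demo

        // Count how many records each reducer would receive.
        Map<Integer, Integer> load = new TreeMap<>();
        for (String key : keys) {
            load.merge(partitionFor(key, numReducers), 1, Integer::sum);
        }
        System.out.println("records per reducer: " + load);
    }
}
```

Equal keys always land on the same reducer because the result depends only on the key's heuristic value; how evenly the load spreads depends entirely on how uniform those values are.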

Although heuristic algorithms are crucial to Apache Hadoop and Spark, research about how dedicated
heuristic functions perform is hard to find. Thus, the related work we found is comparatively limited. He et al.
[4] investigated graphics card processing on Apache Hadoop and mentioned the potential impact of a good
or bad heuristic function. However, they did not discuss the actual impact of a bad heuristic function.
Katsoulis [5] uses heuristic functions extensively, e.g., for joins, but again does not discuss any performance
impact of heuristic functions. However, this research further underlines the reliance on heuristic functions.
Bertolucci et al. [6] discussed big data partitioning in Spark in 2015 but again did not focus on
heuristic functions. They focused on the difference between dynamic and static partitioning. Kocsis et al. [7]
proposed a method to repair broken Apache Hadoop heuristic functions but did not discuss the performance
impact of bad, broken heuristic functions in detail. Their key contribution is an algorithm to automatically fix
or optimize heuristic functions so that they perform better. Ramakrishna et al. [8] describe the importance of
good heuristic functions for high-performance computers but limit themselves to a theoretical approach. To the
best of our knowledge, there is no related work which discusses how a heuristic function impacts the overall
performance of Apache Hadoop or Spark. While there is research using and relying on heuristic functions, we
could not find any which puts numbers on the actual impact. It is commonly accepted that heuristic
functions influence the performance of Apache Hadoop and Spark significantly; nevertheless, a detailed
analysis was not found. This paper is structured as follows: Section 2 presents the implementation details behind
the analysis, Section 3 provides the results of the analysis, and Section 4 concludes this paper.


2. METHOD 
2.1. Heuristic approaches for database 

We selected 14 different heuristic functions and divided them into four categories: general purpose
heuristic functions, cryptographic heuristic functions, checksums as heuristic functions, and special heuristic
functions. The first category contains heuristic functions which are used to divide data. They should serve the
purpose of Apache Hadoop and Spark perfectly, as they are supposed to be well balanced between speed and
heuristic value distribution. General purpose heuristic functions mostly use a limited number of operations and
are not difficult to implement. The second category contains heuristic functions for cryptographic purposes.
They are less efficient but provide the certainty that the original value cannot be calculated back from the
heuristic value. One way to ensure this is the Feistel structure. The input is manipulated in multiple rounds,
and in each round it is split into two segments. Those segments are exchanged. One of them gets XORed with a
set of predefined numbers and the other gets XORed with this result. Through many such iterations both
security and distribution are ensured. Although this security is not necessary for Apache Hadoop and Spark,
these functions are good markers for bad heuristic functions, as they are usually very slow. In the third
category, common checksum algorithms can be found. They are designed to be quick to compute but are created
only to detect transmission errors, and thus distribution is disregarded. Figure 2 shows that such functions tend
to have certain values which are more likely to be the result than others. Their inner workings typically rely on
a single iteration in which mathematical operations, including modulo operations, are applied in combination
with predefined numbers. Thus, their heuristic value distribution might be worse than that of standard heuristic
functions, but their throughput is higher. The last category contains special functions which have a
heuristic-like behavior with certain additional properties.

For each of the first three categories, four representatives are chosen. The special heuristic functions
category contains two functions. An overview of how the individual functions performed on a standard machine
is depicted in Figure 1. The left side shows the throughput for an 11-byte input and the right side the
throughput for a 200 MB file. Some heuristic functions perform better for big inputs and some perform better
for small inputs. The results are grouped by category, and the throughput is calculated in Mbps with a single
3.6 GHz processor on Ubuntu 16.04. Figure 2 shows the distribution of the heuristic functions. It shows how
many out of 10,000 numbers in text format fall into each of 128 buckets. A thinner and lower bar
represents a better distribution. A scattered plot indicates a bad distribution. The general-purpose heuristic
functions have a better distribution than expected. For the cryptographic heuristic functions, SHA256 is a
surprising outlier. The others are marginally better than the general-purpose heuristic functions. Both
checksums and special heuristic functions have a bias in the distribution which can end up reducing parallelism.
All evaluated heuristic functions are implementations which can be found on the internet. This approach reduces
potential implementation errors. Figures 1(a) and 1(b) show the throughput of the different heuristic functions
on a single 3.6 GHz core, clustered by type, for 11-byte and 200 MB inputs.
Figure 1. Throughput of the different heuristic functions on a single 3.6 GHz core clustered by type
(a) 11-byte and (b) 200 MB

Figure 2. The number of entries of resulting heuristic values modulo 128 of 10,000 numbers in text format


The four leftmost on each side are general-purpose heuristic functions. The next four are
cryptographic heuristic functions. The following four are checksums for error
detection. The last two are in the special heuristic function category. Each dot represents one of the 128 modulo
buckets. Both general-purpose heuristic functions and cryptographic heuristic functions have a similar
distribution. Checksums and special heuristics have, as expected, a worse distribution. The dot at 10,000 for
the no-heuristic function is not displayed.
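
The bucket experiment behind Figure 2 can be approximated with a short Java sketch. It is an illustration under stated assumptions, not the original measurement code: the class name BucketHistogram and the method bucketCounts are hypothetical, the inputs are assumed to be the integers 0 to 9,999 written as decimal text, and Java's built-in String.hashCode() stands in for the heuristic under test.

```java
import java.util.function.ToIntFunction;

// Hypothetical helper; counts how many text-formatted numbers fall into each bucket.
final class BucketHistogram {

    static int[] bucketCounts(ToIntFunction<String> heuristic, int inputs, int buckets) {
        int[] counts = new int[buckets];
        for (int i = 0; i < inputs; i++) {
            String text = Integer.toString(i); // number in text format
            // floorMod keeps the bucket index non-negative for negative heuristic values.
            counts[Math.floorMod(heuristic.applyAsInt(text), buckets)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = bucketCounts(String::hashCode, 10_000, 128);
        int min = Integer.MAX_VALUE;
        int max = 0;
        for (int c : counts) {
            min = Math.min(min, c);
            max = Math.max(max, c);
        }
        // A perfectly even spread would put about 78 entries (10,000 / 128) in each bucket.
        System.out.println("min=" + min + " max=" + max);
    }
}
```

Passing a constant function such as `s -> 0` reproduces the no-heuristic case: a single bucket receives all 10,000 entries.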


2.2. General purpose heuristic functions 

The first representative is the default hashCode() [9] implementation of Java. It is commonly used in
Java software and implemented through native code. Thus, the implementation is platform dependent but usually
delivers an output well balanced between speed and distribution. The second heuristic function in the category is the
Jenkins [10], [11] heuristic. In our test environment, it performs slightly worse than the default implementation of
hashCode(). It should serve as a good reference point for how well the default implementation performs on the AWS
setup compared to the local one. As a third heuristic, MurmurHash [12], [13] is chosen, and xxHash [14], [15]
as the fourth. Both are recent developments and focus on throughput. xxHash achieves almost
twice the throughput of the default Java implementation on Ubuntu 16.04.
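
To give an impression of how few operations a general purpose heuristic needs, the following Java sketch implements Jenkins's one-at-a-time function. The text does not say which Jenkins variant was evaluated, so the choice of one-at-a-time is an assumption, and the class name JenkinsSketch is hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class implementing Jenkins's one-at-a-time function.
final class JenkinsSketch {

    static int oneAtATime(byte[] data) {
        int hash = 0;
        for (byte b : data) {
            hash += b & 0xff;   // treat the byte as unsigned
            hash += hash << 10;
            hash ^= hash >>> 6; // unsigned shift, as in the C original
        }
        // Final mixing steps.
        hash += hash << 3;
        hash ^= hash >>> 11;
        hash += hash << 15;
        return hash;
    }

    static int oneAtATime(String text) {
        return oneAtATime(text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Prints ca2e9442 for the single-character input "a".
        System.out.println(Integer.toHexString(oneAtATime("a")));
    }
}
```

Each input byte costs only additions, shifts, and XORs, which is typical for this category.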
2.3. Cryptographic heuristic functions

The first chosen representative of this group is BLAKE2s [16], [17]. It is a novel cryptographic
heuristic function which promises superior speed and safety. Strangely, the implementation we use performs
worse than the other cryptographic heuristic functions. Nevertheless, we kept it, as it illustrates the behavior
of a very slow heuristic function. The second cryptographic heuristic function is Whirlpool [18], [19]. It is a
rather recent development which promises good performance with superior security. In our test setup, it is the
second slowest representative in this category. The last two representatives are MD5 [20], [21] and SHA256 [22].
Both have a similar throughput, which is higher than that of the others in this category. While MD5 is officially
considered insecure, SHA256 has not been deprecated. Both are still commonly used in real world applications,
which makes them good choices. Since all cryptographic heuristic functions produce more than 32 bits of output,
only the first 32 bits are used in our analysis.
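
Reducing a cryptographic digest to a 32-bit value can be sketched as follows. This is an illustration only: the class name DigestTruncation and the method first32Bits are hypothetical, and reading the first four digest bytes in big-endian order is an assumption, since the text does not state the byte order. Only MD5 and SHA-256 are shown because they ship with the standard java.security.MessageDigest; BLAKE2s and Whirlpool typically need a third-party provider.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper; truncates a cryptographic digest to a 32-bit value.
final class DigestTruncation {

    static int first32Bits(String algorithm, String input) {
        try {
            MessageDigest md = MessageDigest.getInstance(algorithm);
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            // ByteBuffer reads big-endian by default: the first four digest bytes.
            return ByteBuffer.wrap(digest).getInt();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("digest not available: " + algorithm, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(first32Bits("MD5", "lorem")));
        System.out.println(Integer.toHexString(first32Bits("SHA-256", "lorem")));
    }
}
```

The whole digest is still computed before it is truncated, so the truncation does not save any computation time.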


2.4. Checksums as heuristic functions 

Both Fletcher32 [23], [24] and CRC32 [25], [26] have a similar throughput behavior. They are
developed specifically for error detection and not for an even distribution of codes. However, they do convert an
arbitrary input to a 32-bit output and thus can function as heuristic functions. With Adler32 [27], [28] we chose
a better performing checksum algorithm for comparison. Similarly to the previous ones, it generates 32 bits of
output. The superior speed is traded for slightly worse error detection, resulting in a less even distribution.
Lastly, we added the XOR checksum for comparison. It is the only function that generates an 8-bit output, by
simply applying the XOR operation to every byte of the input. Thus, it has the worst distribution of all but
outperforms any other function in throughput.
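
Three of these checksums can be sketched in Java to show how they serve as 32-bit heuristic values. The class ChecksumSketch and its method names are hypothetical. CRC32 and Adler32 come from java.util.zip, and the XOR checksum is written by hand; Fletcher32 is omitted because the standard library has no implementation of it.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Adler32;
import java.util.zip.CRC32;

// Hypothetical demo class; wraps checksums so they return an int usable as a heuristic value.
final class ChecksumSketch {

    static int crc32(String text) {
        CRC32 crc = new CRC32();
        crc.update(text.getBytes(StandardCharsets.UTF_8));
        return (int) crc.getValue(); // getValue() returns the 32-bit result in a long
    }

    static int adler32(String text) {
        Adler32 adler = new Adler32();
        adler.update(text.getBytes(StandardCharsets.UTF_8));
        return (int) adler.getValue();
    }

    // XOR of all input bytes: only 256 distinct results are possible.
    static int xor8(String text) {
        int acc = 0;
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            acc ^= b & 0xff;
        }
        return acc;
    }

    public static void main(String[] args) {
        String key = "lorem";
        System.out.println(Integer.toHexString(crc32(key)));
        System.out.println(Integer.toHexString(adler32(key)));
        System.out.println(xor8(key));
    }
}
```

Because xor8 can only return values from 0 to 255, and XOR over ASCII text tends to cancel the high bits, many different keys collapse onto the same few values.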


2.5. Special heuristic functions 

Additionally, we added two special heuristic functions to the analysis. With SimHash [29], [30]
we evaluate a heuristic function which produces similar heuristic values for similar input values. Although this
should not result in an advantage for Apache Hadoop and Spark, we chose it out of curiosity. Lastly, we
added a no-heuristic simulation for comparison purposes. This function always returns the same heuristic value
for arbitrary input. Thus, all the data will be gathered at a single reducer and the load is not balanced at all.
However, it returns the value instantly and does not need any operations.


2.6. Apache Hadoop and Spark

The source packages of both Apache Hadoop and Spark provide example implementations for the
word count problem. We modified these samples and use them for our analysis. In the Apache Hadoop case,
we override the standard org.apache.hadoop.io.Text class with our custom implementation. This
class provides a custom hashCode() implementation, which applies a selected heuristic
algorithm to the input. By overriding the hashCode() function of the key, it is automatically applied
whenever a heuristic code is necessary. For Apache Spark, we create our own class MyString. It acts as a
wrapper class for strings and overrides the equals() and hashCode() functions. The hashCode()
function selects the heuristic algorithm to use. We use this class for the computations with Apache Spark, and
thus the custom heuristic function is in use wherever possible.
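
A wrapper class of this kind could look like the following sketch. Only the class name MyString and the overridden equals() and hashCode() methods come from the text; the static algorithm field, the use of java.util.function.ToIntFunction, and the Serializable marker are assumptions made to keep the example self-contained, and the original implementation may select the algorithm differently.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToIntFunction;

// Sketch of a string wrapper whose hashCode() delegates to a selectable algorithm.
final class MyString implements Serializable {

    // Hypothetical switch for the heuristic under test; defaults to Java's own hashCode().
    static ToIntFunction<String> algorithm = String::hashCode;

    private final String value;

    MyString(String value) {
        this.value = value;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof MyString && value.equals(((MyString) other).value);
    }

    @Override
    public int hashCode() {
        return algorithm.applyAsInt(value);
    }

    @Override
    public String toString() {
        return value;
    }
}

// Hypothetical local word count that uses MyString as the key type.
final class MyStringDemo {
    public static void main(String[] args) {
        Map<MyString, Integer> counts = new HashMap<>();
        for (String word : "lorem ipsum lorem dolor".split(" ")) {
            counts.merge(new MyString(word), 1, Integer::sum);
        }
        System.out.println(counts);
    }
}
```

Because equals() still compares the wrapped strings, results stay correct even with a poor heuristic; only the distribution of keys, and therefore the running time, changes.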


2.7. PageRank 

The PageRank algorithm is only used in Apache Spark due to its iterative nature. There exists a standard
implementation in the source code of Apache Spark, which we modified and used for this paper. Similarly to the
Apache Spark implementation of the word count example, we created the class MyString. It acts as a
wrapper class for strings and overrides the equals() and hashCode() functions. The hashCode() function
selects the heuristic algorithm to use. With some modifications of the provided example, we use the MyString
class instead of normal strings and thus use the custom heuristic algorithm wherever possible.


3. RESULTS AND DISCUSSION 

Amazon provides several frameworks, such as Amazon Elastic Compute Cloud (EC2) and
Amazon Elastic MapReduce (EMR). Normally, EC2 provides remote servers for a user to customize. In this
project, spending a large amount of time on setting up a cluster would be ineffective, since we run on
both Hadoop and Spark and our target is the heuristic algorithms. We therefore select EMR as our remote
environment. Users can easily edit the configuration of a framework, and the whole cluster can be fully functional
within ten minutes. All EMR instances are based on Amazon EC2. The file system commonly used by
Hadoop and Spark is the Hadoop distributed file system (HDFS); on EMR, one can choose between
traditional HDFS and EMRFS with Amazon S3. Considering the limited funds, our project runs on traditional
HDFS. The Amazon command line interface (CLI) is the interface for input. For this project, we run our
programs on a cluster consisting of three m4.large nodes, each powered by two cores of an Intel Xeon E5-2686 v4
or E5-2676 v3 with 8 GB of memory. The version of Hadoop is 2.7.3 and that of Spark is 2.2.0. The number of
partitions in Hadoop is set to 5. The number of partitions in Spark, however, is set automatically to
16 and is not affected by spark-submit commands, which better exploits the strengths of
parallelization. For local runs, we set up Hadoop on two machines to distribute our work: one machine has an
Intel Core i5-7360U with 4 GB of memory and the other an Intel Core i5-5257U with 8 GB of
memory. Hadoop is installed in pseudo-distributed mode with 5 partitions, and Spark runs in local
mode with 5 partitions as well.

Three datasets are used for word count. The first is an air quality index dataset from meteorological
stations in China [31]; it contains around 200,000 numbers, both integers and decimals. The second
is lorem ipsum text generated by an online website [32]. Lorem ipsum is a text of meaningless, randomly
generated words, used in the design field to minimize the influence of the words' meanings. It does contain
duplicate words, its size is easy to control with accessible generators, and its structure is similar to that of
real articles. The size for the local test is 20,777,978 bytes and for EMR 10,388,990 bytes. For the local test
the size is doubled to make sure that Hadoop can generate enough partitions. The last is a 49.4 MB duplicate
questions file from Quora, which includes over 400,000 question pairs [33] that are potentially duplicates. This
dataset is closer to real work and can better simulate the situations in real tasks.

For the PageRank part, we wrote a small program that generates a list of websites with their names
represented as numbers, followed by a random PageRank value. We generated a list of 100,000 websites for EMR
and one of 1,300,000 websites for the local test. On EMR, we took the overall duration of the collect part to
mimic the practical time elapsed in a cluster; locally, we took the time of the same part from the terminal
output to get the most accurate result for experimental purposes. The test is first launched on Spark on EMR.
For every dataset we run it ten times to get the mean value, except for some extremely slow heuristic
algorithms, which are launched only three to five times to make sure the slow performance is not an exception.

The performance of the heuristics on the air quality dataset can be seen in Figure 3. Overall, the algorithms
except BLAKE2s have similar time performance; Adler32 and xxHash can even be a little faster than the
default hashCode() in some rounds on such a relatively small dataset. Time consumption for the lorem
ipsum test is given in Figure 4. The cryptographic heuristics and special functions start to
show a trend of slowing down, especially BLAKE2s, and it is noteworthy that xxHash is even slightly faster
here. The time performance on the Quora duplicate questions dataset
is given in Figure 5. This dataset is big enough that the tests on BLAKE2s and no heuristic took so long that we
could not obtain a result, so we simply represent them as N/A.
Figure 3. The average time consumption of Spark word count program in collective part. There’s
no significant performance difference in this case and even ADLER32 can be slightly faster than
default heuristic function

Figure 4. Time consumption of running on Apache Spark shows the relatively worst performance of
cryptographic heuristics and slightly slow performance of special heuristics


General heuristics benefit from their superb throughput and remain the best,
while the cryptographic and special heuristics become practically unusable, taking more than
twice the time, which makes sense given their small throughput. The checksums, except XOR8, also show this
correlation, but to a much lesser degree. Under the no-heuristic circumstance, all records are delivered to
the first reducer, so there is little parallelization in this case and the performance is the worst. One might
expect XOR8 to be faster because its throughput is good, but here the opposite is the case.
Time Consumption of Quora Questions on EMR running Spark
Figure 5. The obvious difference between heuristic types generally shows a reasonable negative
correlation with their throughput level, but not a complete fit


The imperfect correlation among general purpose heuristics also indicates that, besides
throughput, other factors affect the performance of heuristics. So, we collected the number of records
each partition receives to see how records are partitioned and distributed to reducers. We took the standard
deviation of these counts to quantify the distribution. As the duty of a heuristic is to spread records as evenly
as possible, a higher standard deviation represents a worse job. However, when we analyzed the Spark
distribution result, we were surprised to find that Adler32 and Fletcher32 behave strangely on Spark with large
datasets: the total number of records generated is larger than it is supposed to be. When we checked this
with small datasets and on Hadoop, it never happened. Nevertheless, compared with the other results, the result
in Figure 6 still shows the same pattern, and we can still take it as a reference.
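
The distribution measure can be sketched as a standard deviation over the per-reducer record counts. The class name ReducerLoad and the sample counts are hypothetical, and the population form (dividing by the number of reducers) is an assumption, since the text does not say whether the population or the sample standard deviation was used.

```java
// Hypothetical helper; quantifies how unevenly records are spread across reducers.
final class ReducerLoad {

    // Population standard deviation of the record counts (assumed form).
    static double standardDeviation(long[] recordsPerReducer) {
        double mean = 0;
        for (long r : recordsPerReducer) {
            mean += r;
        }
        mean /= recordsPerReducer.length;

        double sumOfSquares = 0;
        for (long r : recordsPerReducer) {
            double diff = r - mean;
            sumOfSquares += diff * diff;
        }
        return Math.sqrt(sumOfSquares / recordsPerReducer.length);
    }

    public static void main(String[] args) {
        long[] even = {2000, 2010, 1990, 2005, 1995};  // made-up counts, well balanced
        long[] skewed = {9000, 250, 250, 250, 250};    // made-up counts, one hot reducer
        System.out.println("even:   " + standardDeviation(even));
        System.out.println("skewed: " + standardDeviation(skewed));
    }
}
```

A value of zero means every reducer received exactly the same number of records; larger values indicate a more uneven load.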


Distribution Conditions of Quora Questions on EMR Spark (x-axis: Hash Algorithms; y-axis: Standard Deviation of Records each Reducer Received)
Figure 6. Distribution conditions taken from standard deviation of records each reducer received


General heuristics with higher throughput unfortunately have a relatively worse distribution. It is
interesting that exactly the heuristics with better throughput have the worse distribution; this drags down
their overall speed and, as a result, the differences between general purpose heuristics are neutralized. The
cryptographic heuristics are greatly limited by throughput, so their distribution cannot make them better.
The performance of checksums is hard to predict: according to Figure 2, their records may or may not be
scattered evenly. In this case, Fletcher32 shows a scattered pattern while Adler32 is acceptable,
and Adler32’s better throughput makes it better than Fletcher32. Special heuristics show such a bad distribution
that we should avoid using them in a partitioner. For the local environment, our data show a
similar result (Figures 7 and 8). The significant difference is in the checksums: Adler32 and Fletcher32 seem to
be steady this time, but XOR8 turns out to be extremely bad. This time, we can obtain the distribution of records
for BLAKE2s, which is quite acceptable, but its low throughput is still the bottleneck and leads to an
unsatisfactory outcome. Hadoop's results are better able to describe the actual performance of heuristics,
because the problem only occurred on Spark and not on Hadoop. As the results from EMR are similar and
running in pseudo-distributed mode reduces the unpredictable factors of a distributed cluster as much as
possible, we will primarily discuss the Hadoop results obtained on local machines.
Time Consumption of Quora on Local running Spark
Figure 7. Time consumption on local Spark shows the similar pattern as on a remote cluster on EMR which
proves the running mode of Spark won’t change the performance of heuristic algorithms


Distribution Conditions of Quora Questions on Local running Spark
Figure 8. Distribution of local Spark is able to get the data of BLAKE2s, which proves the cryptographic
heuristics’ performances are dominated by throughput regardless of the tolerable spread


One such unpredictable factor is the lag time in network communication between EMR instances. The
performance of XOR8 is the only notable difference between the tests on Spark and Hadoop, which is expected
given that its records are only sporadically well distributed. On Hadoop, the previously drawn conclusions still
hold for the majority of the heuristics. The Hadoop results demonstrate that the heuristics operate consistently
in both frameworks. Figure 9 shows the performance of XOR8, which is the main difference between Spark and
Hadoop and serves as an excellent example of how distribution affects task speed. Figure 10 shows that the
distribution of XOR8 is greatly improved this time, which combines with its outstanding throughput.
Figure 9. The main discrepancy between Spark and Hadoop is the performance of XOR, which can be an
excellent proof of the impact of distribution on the task speed

Distribution Conditions of Quora Questions on Local running Hadoop
Figure 10. Distribution conditions of XOR8 are greatly improved this time, and combining with its
outstanding throughput


PageRank is an iterative algorithm that only accepts pairs of integers as input and operates entirely
differently from word count. Figures 11 and 12 present the results gathered from EMR and from the local
setup, respectively. We obtained the same patterns of time consumption and record distribution as with Hadoop.
This demonstrates the independence of heuristic algorithms: the specifics of the tasks in these two programs
have no bearing on the performance of heuristic functions, because their responsibility is only to map records
from mappers to the appropriate reducers. They only need to be concerned with uniformity and speed.
Figure 11. Results of PageRank from EMR have the same features and correlations as the previous results

Time and Distribution Conditions of PageRank on Local
Figure 12. CRC displays poor distribution, slowing speed compared to EMR, but small throughput mitigates
speed decrease


4. CONCLUSION 

In this project, 14 heuristic algorithms divided into 4 groups are tested, and unfortunately
only a few heuristics have a slight potential of replacing the default algorithm when running with small
datasets. The data show that general purpose heuristics always have the best performance, are highly
predictable, and are similar to each other. Replacing the default with another general purpose heuristic is a
safe choice if necessary. The cryptographic heuristics can only compete with the others on small datasets; their
unsatisfactory throughput is a bottleneck for speed. As the name says, they are better suited to security-related
jobs and are not a suitable replacement in parallel systems. Checksums generally possess the worst uniformity of
distribution. As a result, their performance is unpredictable and can sometimes be costly. They are also not
recommended for distributing records. The special group is mainly included for reference and comparison. Its
members are unusable in most cases; however, they help us understand how a bad heuristic, or even no
heuristic, would influence the system, which emphasizes the importance of heuristic algorithms in cloud
computing.

Based on the data collected about throughput and distribution, we can also conclude that the
performance of heuristics in the partitioner is largely dominated by these two factors. Usually, throughput is
the foundation on which distribution builds, and it largely decides how a heuristic performs inside the
partitioner. In this project we tested a limited number of heuristic algorithms to find out the relationship
between the performance and the type of heuristic. Only on small datasets did we find that some heuristics
have the potential of replacing the default heuristic function to provide better performance. There are still
more functions worth implementing and testing. We believe that after more mature tests, we can provide a
guidebook on when and where to use which kind of heuristic, to avoid blind trial and error when the default
heuristic is not suitable under specific circumstances. That would, however, require a lot of time and effort to
collect and test the commonly used heuristics. We do find correlations between the performance of tasks and
the throughput and uniformity of heuristic algorithms, but these are inferred from a small data sample, and we
even encountered some strange behavior of Adler32 and Fletcher32 on Spark with large datasets. So, we still
need more tests with various dataset types and sizes to confirm our conclusion. The correlations between
performance, throughput, and distribution are qualitative rather than quantitative. It would be valuable to find
a quantitative relation between the three factors, which could be attempted by applying machine learning, since
we have a considerable amount of raw data.